Association analysis for resistance to Striga hermonthica in diverse tropical maize inbred lines

Striga hermonthica is a widespread, destructive parasitic plant that causes substantial yield loss to maize productivity in sub-Saharan Africa. Under severe Striga infestation, yield losses can range from 60 to 100% resulting in abandonment of farmers’ lands. Diverse methods have been proposed for Striga management; however, host plant resistance is considered the most effective and affordable to small-scale famers. Thus, conducting a genome-wide association study to identify quantitative trait nucleotides controlling S. hermonthica resistance and mining of relevant candidate genes will expedite the improvement of Striga resistance breeding through marker-assisted breeding. For this study, 150 diverse maize inbred lines were evaluated under Striga infested and non-infested conditions for two years and genotyped using the genotyping-by-sequencing platform. Heritability estimates of Striga damage ratings, emerged Striga plants and grain yield, hereafter referred to as Striga resistance-related traits, were high under Striga infested condition. The mixed linear model (MLM) identified thirty SNPs associated with the three Striga resistance-related traits based on the multi-locus approaches (mrMLM, FASTmrMLM, FASTmrEMMA and pLARmEB). These SNPs explained up to 14% of the total phenotypic variation. Under non-infested condition, four SNPs were associated with grain yield, and these SNPs explained up to 17% of the total phenotypic variation. Gene annotation of significant SNPs identified candidate genes (Leucine-rich repeats, putative disease resistance protein and VQ proteins) with functions related to plant growth, development, and defense mechanisms. The marker-effect prediction was able to identify alleles responsible for predicting high yield and low Striga damage rating in the breeding panel. This study provides valuable insight for marker validation and deployment for Striga resistance breeding in maize.

www.nature.com/scientificreports/ Several control methods have been proposed for Striga management, including host plant resistance, cultural, chemical and manual control options. However, integrated Striga management approach is considered the most economical and affordable for small-scale farmers who cannot afford high inputs control options. The approach involves the combination of two or more control options. Host plant resistance is considered a cost effective, environmental feasible and affordable option for smallholder farmers. It is also an essential component of any successful integrated approach for controlling Striga parasitism. Several studies have shown progress in breeding for natural genetic resistance to Striga in maize 5 . In addition, extensive research has been done to map quantitative trait loci (QTLs) for Striga resistance in maize using molecular markers. QTL mapping and genome-wide association study (GWAS) are two methods widely used to discover genetic loci controlling complex traits. Quantitative trait loci (QTLs) associated with Striga hermonthica have been identified in maize using QTL mapping approach [9][10][11] . Badu-Apraku et al. 9 identified 12 QTLs associated with four Striga resistance/tolerance traits in maize and these QTLs explained 3.2 to 34.9% of the phenotypic variation. However, the QTL mapping approach have several limitations, for example, it has limited allelic diversity, and limited mapping resolution due to limited recombination events 12 . On the other hand, GWAS explores ancestral recombination in natural genetically diverse population to dissect complex traits 13 . GWAS is an improvement over QTL mapping in that it improves the resolution of QTLs due to accumulated meiotic events and reduces the time taken in developing mapping populations 14 . GWAS is a powerful tool for detecting QTLs associated with important complex quantitative traits, as well as predicting or identifying causative genes 15 .
Many statistical models have been developed to improve the power of identifying QTNs with the GWAS approach. The single-locus mixed-linear model (MLM) is the most common method used for GWAS. The method uses several algorithms such as the compressed MLM 16 , enriched MLM 17 , however, all these models perform single dimensional genome scan and require multiple correction. These models also have major limitations in mapping QTNs with small effects. Wang et al. 18 proposed a new model based on multi-locus random-SNPeffect MLM (mrMLM). The methods include polygenic-background control-based least angle regression plus empirical Bayes (pLARmEB), fast multi-locus random-SNP-effect efficient mixed model association (FASTm-rEMMA), iterative-sure independence screening expectation-maximization (EM)-Bayesian LASSO (ISIS EM-BLASSO) and fast multi-locus random-SNP-effect mixed linear model (FASTmrMLM) [18][19][20][21][22] . These methods can effectively detect small-effect QTNs and improve the efficiency and accuracy of GWAS. Recently, few studies have implemented the above GWAS methods to detect important loci controlling different traits in maize 23 .
Genome-wide association study (GWAS) for S. hermonthica resistance has been conducted in maize. Adewale et al. 24 identified 24 SNPs that were significantly associated with four Striga resistance-related traits in early maturing maize inbred lines using compressed MLM, these lines captured only the genetic variation existing in the extra early and early maturing maize germplasm developed at IITA. However, genomic regions governing Striga resistance in many intermediate and late-maturing maize inbred lines with consistent expression of polygenic resistance to S. hermonthica have not been identified. This germplasm offers an excellent resource for discovering functional genes underlying the genetic variation in the Striga resistance-related traits. This study was thus conducted; to evaluate diverse intermediate and late maturing maize inbred line for Striga resistance under Striga infested and non-infested conditions and identify genomic regions and candidate genes related to Striga resistance.

Results
Phenotypic diversity. In the combined analyses of variance (ANOVA), environments and lines had significant (p < 0.001) effects on the three Striga resistance-related traits under Striga infested conditions and for grain yield under non-infested condition (Tables 1 and 2). The line x environment interaction mean squares were also significant for most of the traits measured under the two conditions. Further assessment of the line x environment interaction using rank correlation analyses between pairs of environments found highly significant (p < 0.0001) correlations for the Striga resistance-related traits (Supplementary Table S1). The broad-sense heritability estimates for the Striga resistance-related traits varied from 81 to 85% (Supplementary Table S2).  Table S2). Relative to the non-infested condition, the resistant benchmark (9450) suffered 46% yield loss, whereas the susceptible benchmark (5057) suffered 77% yield loss indicating the inbred lines used in this study were exposed to severe Striga infection. In addition, 86 maize inbred lines supported significantly fewer emerged Striga plants at 8 and 10 weeks after planting (WAP). These lines, on average, suffered 19% yield reduction relative to the non-infested conditions, produced significantly higher grain yields than the resistant benchmark (9450), and were categorized as resistant lines. In contrast, 44 inbred lines supported as many Striga plants as the susceptible benchmark (5057) but produced significantly higher grain yields than the susceptible line, and were categorized as tolerant. The remaining 20 inbred lines, supported as many Striga plants as the susceptible benchmark and did not differ significantly from the susceptible line in their grain yields, and were regarded as susceptible. All paired traits showed statistically significant differences at p-value < 0.01. A negative correlation was observed between grain yield and Striga damage rating and emerged Striga plants at 8 and 10 WAP. However, there was a positive correlation between Striga damage rating and emerged Striga plants at 8 and 10 WAP (Supplementary Fig. S1).
Genotyping. Population structure and linkage disequilibrium. For the genotypic analysis, 16,735 SNPs distributed across the ten maize chromosomes were identified after the quality control process. A minimum of 1208 SNPs (7.2%) were mapped on chromosome 10, whereas a maximum of 2532 SNPs (15.2%) were mapped on chromosome 1. The Admixture analysis using tenfold cross-validation from k = 1 to k = 10 showed a sharp elbow at k = 3, indicating the inbred lines can be grouped into three subgroups (Fig. 1A,B). The principal component analysis (PCA) also grouped the inbred lines into three subgroups and this is consistent with the Admixture results. A scree plot generated to visualize the fraction of variance represented by each of the ten principal components showed that two (PC1 and PC2) explained the largest proportion (42.3%) of the total variance (Fig. 1C). Furthermore, the phylogeny tree clustering also grouped the inbred lines into three distinct subgroups; 22, 27 and 101, derived from ZDIP, IWDS and Mixed groups, respectively (Fig. 1D). The assignment of the inbred lines into the three subgroup based on the phylogeny tree were in good agreement (98%) with those revealed by PCA The inbred lines were grouped based on their genetic background/pedigree and maturity. The LD estimates (r 2 ) showed a slight increasing and then consistent pattern of LD decay was observed with an increase in the physical distance of SNP markers mapped on the 10 chromosomes ( Supplementary Fig. S2). The average linkage disequilibrium decay varied from 2.73 kb on chromosome 6 to 3.68 kb on chromosome 8 (Supplementary Table S3).
Genome-wide association analysis. The GWAS multiple-locus models used in this study identified 30 significant SNPs that were significantly associated with the three Striga resistance-related traits. These SNPs were distributed on all maize chromosomes but chromosome 8. The highest number of SNPs was found on Chromosome 1, and the least on chromosomes 3, 6 and 7 ( Table 3). The results of the Manhattan plot and the quantilequantile plots revealed reasonable data adjustment and a few significant SNPs above the interval of the expected values of the null hypothesis (Fig. 2). This study employed four multi-locus methods, mrMLM, FASTmrMLM, FASTmrEMMA, and pLARmEB to perform comprehensive GWA mapping in our diversity panel. Among the four methods mrMLM identified the highest number of SNPs (19) while, FASTmrEMMA identified the least (5). In addition, FASTmrMLM identified the most codetected SNPs among the four GWAS models used. Four SNPs were associated with grain yield under Striga infested condition. These SNPs are located on three chromosomes, and each SNP explained between 3.21 to 13.36% of the phenotypic variation. One of the SNPs (S4_164335765) associated with grain yield was detected by two GWAS multi-locus methods (FASTmrMLM and mrMLM) and they explained 6.7 and 13.4% of the phenotypic variation. The LOD score of the significant SNPs ranged from 7.18 (S4_164335765) to 7.39 (S10_1784894).
Eleven SNPs were associated with Striga damage rating at 8 WAP. These SNPs are located on seven chromosomes and the proportion of phenotypic variation explained by each SNP ranged from 0.14 to 14.19%. The LOD values of the identified SNPs ranged from 6.06 (S4_160459526) to 11.80 (S10_68324912). Nine SNPs were associated with Striga damage rating at 10 WAP. These SNPs were located on five chromosomes, and they explained 4.13 to 11.60% of the phenotypic variation. Four SNPs (S1_284192573, S1_10576247, S2_99667127, www.nature.com/scientificreports/ and S7_10795659) associated with Striga damage rating at 10 WAP were detected by two or more of the GWAS multi-locus (mrMLM, and FASTmrMLM, and FASTmrEMMA) methods.
The LOD values of the identified SNPs ranged from 6.02 (S2_190557148) to 13.76 (S2_188120710). In general, two SNPs (S2_160791711 and S10_25285761) were simultaneously associated with Striga damage rating at 8 and 10 WAP.
Eight SNPs were associated with emerged Striga plants at 8 WAP. These SNPs are located on five chromosomes and the proportion of phenotypic variation explained by each SNP ranged from 5.03 to 13.58%. Five SNPs (S2_208978140, S2_135038935, S5_148751913, S9_7727167, and S10_90133328) associated with emerged Striga plants at 8 WAP were detected by two or more of the GWAS multi-locus (mrMLM, FASTmrMLM, FAST-mrEMMA, and pLARmEB) methods. The LOD values of the identified SNPs ranged from 6.14 (S1_298950342) to 9.28 (S10_125571525). Four SNPs were associated with emerged Striga damage plants at 10 WAP. These SNPs are located on three chromosomes and the proportion of phenotypic variation explained by each SNP ranged from 6.33 to 12.21%. Three SNPs (S2_135038935, S5_148751913, and S5_204969099) associated with emerged Striga plants at 10 WAP were detected by two GWAS multi-locus (mrMLM, and FASTmrMLM) methods. The LOD values of the identified SNPs ranged from 6.83 (S2_135038935) to 8.65 (S5_148751913). In general, three SNPs (S2_135038935, S5_148751913, and S10_12557152) were simultaneously associated with emerged Striga plants at 8 and 10 WAP.
Under non-infested conditions, four SNPs were associated with grain yield (Table 4). These SNPs are located on three chromosomes and the proportion of phenotypic variation explained by each SNP ranged from 5.63 to 17.40%. Furthermore, one of the SNPs on chromosome 1 (S4_189154251) was detected by two of the GWAS multi-locus methods (mrMLM, and FASTmrMLM). The LOD score of these SNPs ranged from 6.52 (S1_26517984) to 9.72 (S4_189154251).
Markers effect prediction. The frequencies and marker prediction effects of various alleles associated with the three Striga resistance-related traits are presented in Table 5. Two of the SNPs (S4_164335765 and S9_1994432) associated with grain yield under Striga infestation displayed high segregation among the inbred lines. For SNPs on chromosome 4, alleles AA and CA were associated with genotypes with higher grain yield, while alleles CC were associated with lower grain yield. For the second SNP on chromosome 9, alleles GG were associated with genotypes with higher grain yield, while alleles AA were associated with lower grain yield (Fig. 3). For Striga damage ratings at 8 and 10 WAP, three SNPs (S1_284192573, S4_160459526, and S5_5623740) displayed high segregation among the inbred lines. For two of the SNPs on chromosomes 1 and 4, alleles TT and TG were asso-  Identifying putative genes. According to the genomic information of B73 Ref Gen_V4, thirty-one putative genes/proteins, including two uncharacterized proteins were found in the intervals adjacent to the significant SNPs detected for the three Striga hermonthica resistance-related traits (Table 6). Remarkably, most of the gene models identified encode transcription factors, disease resistance proteins, zinc-finger domain proteins, leucine-rich repeats protein kinase and some pathogenesis-related proteins. Most of the identified genes were located on chromosomes bins 1.10, 2.05, 2.06, 3.04, 6.01, 7.01. 9.01, 10.00. 10.01, and 10.03. Five gene models were identified around three SNPs associated with grain yield under Striga infestation. Two gene models each were found on chromosomes 4 and 9 and they encode adenylyltransferase, sulfurtransferase (MOCS3 2), U-box domain-containing protein 39, and NLR family CARD domain-containing protein 3. The www.nature.com/scientificreports/ remaining gene model on chromosome 10 encodes VQ proteins. These putative genes/proteins are mainly involved in developmental processes including responses to biotic and abiotic stress, seed development and photo-morphogenesis. The LD heat-map of significant SNPs (S9_1994432 and S10_1784894) identified on chromosomes 9 and 10 were highly correlated (0.5 to 0.8) with regions adjacent to the identified putative genes (U-box domain-containing protein 39, NLR family CARD domain-containing protein 3 and VQ proteins) (Supplementary Fig. S3). Nineteen gene models were identified around seventeen SNPs associated with Striga damage ratings at 8 and 10 WAP. Two gene models each were associated with SNPs S10_25285761 and S10_2743583 located on chromosome 10. These gene models encode leucine-rich repeat extension-like protein, disease resistance protein RPM1, disease resistance RPP13-like protein1 and an uncharacterized protein. Other genes models associated with Striga damage ratings encode putative cytochrome P450 superfamily protein, xyloglucan endotransglycosylase, basic helix-loop-helix (bHLH7 and bHLH20) transcription factors, knotted related homeobox 5, ubiquitinprotein ligase, and zinc-finger domain proteins. Most of these genes/proteins identified are involved in different development and plant defense mechanism. Plant resistance genes allow plants recognize the presence of specific pathogens and initiate defense responses. The LD heat-map of significant SNPs (S5_70442824, S7_10795659 and S10_25285761) identified on chromosomes 5, 7 and 10 were highly correlated (0.5 to 0.8) with regions adjacent  www.nature.com/scientificreports/  www.nature.com/scientificreports/ to the identified putative genes (putative cytochrome P450 superfamily protein, Transcription factor bHLH7, uncharacterized LOC100381459 and leucine-rich repeat extensin-like protein 3) (Supplementary Fig. S4). Nine gene models were identified around nine SNPs associated with emerged Striga plants at 8 and 10 WAP. These gene models encode Dof zinc finger protein, protein accelerated cell death 6, peroxidase 70, hapless, basicdomain leucine-zipper (bZIP46) and WRKY14 transcription factor. Transcription factors are usually involved in diverse plant processes including, growth, development and stress signaling. In addition, protein accelerated cell death is a positive regulator of programmed cell death and it is a mechanism used by plants for defense against pathogen infection. The LD heat-map of significant SNPs (S3_74335447 and S9_7727167) identified on chromosomes 3 and 9 were highly correlated (0.4 to 0.8) with regions adjacent to the putative genes (uncharacterized LOC100382572, bZIP transcription factor 46) (Supplementary Fig S5).  www.nature.com/scientificreports/ Under non-infested conditions, six gene models were identified around four SNPs associated with grain yield ( Table 7). Most of the identified genes were located on chromosomes bins 1.02, 4.08 and 8.02. These gene models encode IQ domain-containing protein IQM5, uncharacterized protein LOC100277298, WRI1 transcription factor 2, ENTH/VHS family protein, Replication protein A 70 kDa DNA-binding and Photosystem I reaction center subunit XI. IQ-domain proteins are common in land plants and they are known for critical roles in host defense, cell shaping and drought resistance.

Discussion
The marked reduction in grain yield observed in the resistant and susceptible benchmark indicates the occurrence of severe parasite infestation across the test environments, eliciting significant differences in resistance or susceptibility reactions among the inbred lines. The diversity panel used in our study displayed considerable phenotypic variation for the three Striga resistance-related traits recorded under Striga infestation, and this is consistent with the findings in other studies 25 . The significant line x environment interaction observed for traits measured under Striga infestation can be attributed to varying seasonal factors, soil pH, and nutrient levels 26 . Also, the significant rank correlations among environments for the major Striga resistance-related traits indicates that the lines maintained consistent resistance or susceptibility reactions to Striga seeds collected from different locations and years. More than 55% of the lines evaluated in this study were resistant to S. hermonthica, and this is due to the severe selection pressure imposed by the breeders during the development of these inbred lines from diverse source populations.
Heritability estimates were high for the Striga resistance-related traits, indicating the predominant role of genetic factors for these traits. Traits with high heritability increase the power of detecting SNPs in an association panel and thus allow the identification of a true association between a marker and a putative gene 15 . The high heritability estimates observed for Striga damage ratings and emerged Striga plants in this study is consistent is consistent with the results reported by Najar et al. 27 and Shayanowako et al. 25 . These findings, however, differ from those of Badu-Apraku et al. 28 , who recorded low heritability estimates (h 2 < 50) for emerged Striga plants and Striga damage ratings. .
The efficiency of association mapping largely depends on population size and population structure, which infers the ancestry of lines based on their genotypic information 29 . The diversity panel used in this study are inbred lines derived from broad-based populations containing tropical and temperate germplasm, backcrosses containing Z. diploperennis adapted to tropical environments, and some lines that are tolerant to drought 30 . The two complementary approach used to infer the population structure grouped the inbred lines into three subpopulations based on their genetic backgrounds/pedigree and maturity information. It is worth noting that there was high agreement in the assignment of the inbred lines into the three subgroups based on the two approaches (Admixture and PCA). The phylogeny tree also grouped the inbred lines into three subgroups.
LD is an important factor that determines the power of marker-trait association analysis. In this study, more than 60% of the SNP pairs across the genome exhibit LD at r 2 > 0.1. In addition, the high r 2 value observed on chromosomes 4 and 8 in this study can be attributed to fewer recombination events on these chromosomes; this is consistent with the findings of Thirunavukkarasu et al. 31 and Dinesh et al. 32 , who reported high r 2 value on chromosome 4 and 8 of maize. Reports have shown that the effectiveness of recombination is limited by the high level of homozygosity 33 . In this study, faster decay of LD with increasing distance between markers was observed, which agrees with Doa et al. 34 and Dinesh et al. 32 .
Several studies have been conducted to dissect the genetic architecture of Striga resistance in maize, and many QTLs associated with Striga resistance have been detected using bi-parental populations [9][10][11] . Badu-Apraku et al. 10 identified 116 QTLs associated with four Striga resistance-related traits (grain yield, Striga damage ratings, ear aspect and emerged Striga plants) using bi-parental (F 2:3 ) population derived from two early maturing maize lines. In another study, Badu-Apraku et al. 9 identified 14 QTLs that were associated with three Striga resistance-related traits (grain yield, ears per plant and Striga damage rating at 10 WAP). However, QTL mapping only exploits only a small fraction of available genetic diversity and exhibits limited capacity to detect polygenic resistance 12 . However, only a few GWAS has been conducted to identify genomic regions associated with Striga hermonthica resistance in maize. Genome-wide association study exhibits high mapping resolution and abundant genetic variation due to the high ancestral recombination events in natural populations 35 . Thus, it has been identified as a useful tool for detecting QTNs associated with complex quantitative traits, as well as predicting or identifying causative genes 15 .
Different statistical models have been used for GWAS, the multi-locus model exhibits a higher distinctive power and a lower false-positive rate for detecting QTNs compared with the single-locus GWAS model 18,36 . The adjustments of single-locus GWAS model improves its detection accuracy to an extent, however, the multipletesting correction (Bonferroni correction) of significance thresholds for single-locus model is too strict. This leads to the exclusion of important loci, especially when large experimental errors occur in field trials of crop genetics. To solve this problem, the application of multi-locus mrMLM is essential. Previous study on GWAS have used the single-locus model to identify quantitative trait nucleotides (QTNs) controlling Striga resistance in maize. Adewale et al. 24 identified 24 SNPs associated with four Striga resistance-related traits in early maturing maize inbred lines using single-locus GWAS model. Further annotation analysis identified three putative genes that explained 9 to 42% of the phenotypic variation. The high phenotypic variation explained can be attributed to the single-locus GWAS model used. However, the genomic regions identified by Adewale et al. 24 differs from those discovered in this study.
In this study, the four multi-locus GWAS models used identified thirty significant SNPs associated the three Striga resistance-related traits. This study is the first to use multi-locus GWAS model to identify SNPs associated with S. hermonthica resistance in maize. Comparing our GWAS results with previous studies on S. hermonithica www.nature.com/scientificreports/ resistance in maize [9][10][11] , there were no similar genomic regions detected, however, additional genomic regions associated with Striga hermonthica resistance were identified. The annotation analysis identified gene models with potential involvement in plant growth, development, and defense mechanism. Intriguingly, some of the gene models identified encode transcription factors (TFs) including WRKY14, basic helix-loop-helix (bHLH7), bHLH20, basic-domain leucine-zipper (bZIP46), JUNGBRUNNEN 1 and zinc finger proteins. Transcription factors (TFs) are critical regulators of gene expression in all living organisms. They are involved in plant development, cell signaling, and plant defense response. Studies have shown that most WRKY TFs respond to pathogen attack and act as both positive and negative regulators in complex defense response network 37 . Studies have associated WRKY TFs with S. hermonthica resistance mechanism in rice. Swarbrick et al. 38 reported the up-regulation of genes encoding WRKY TFs in the roots of Nipponbare, a rice variety with resistance to S. hermonthica. Also, Mutuku et al. 39 reported the knockdown of WRKY45 (WRKY45-kd) by RNA interference in rice plants resulted in susceptibility to S. hermonthica infestation. bHLH is another TF, they are commonly expressed in response to drought stress and they have been reported in rice 40 . The bHLH family TFs in Populus, PebHLH35 from Populus euphratica, has been reported as an essential gene in response to drought by regulating stomatal development and photosynthesis in Arabidopsis 41 .
In addition, the Ring zinc-finger domain superfamily proteins are the most significant TFs known for their finger-like structure and ability to bind to zinc. These proteins have been reported in plants such as wheat (Triticum aestivum), soybean (Glycine max), and rice (Oryza sativa) 42 . Cao et al. 43 indicated that Ring zinc-finger domain superfamily proteins are involved in resistance to blast fungus infection in rice. The DNA binding with one finger proteins (dofs) also regulate the expression of genes involved in plant development and defense processes 44 . In maize, ZmDof1 has been isolated and connected with C4 photosynthesis 45 , which makes it thrive more than the C3 plants under warmer harsh climates because they are known to be drought resilient. In maize, there are no information on the roles of most of these TFs in S. hermonthica resistance; thus, further transcriptomic study will give a better understanding on the role of these TFs in S. hermonthica resistance in maize.
Plants generally lack specific cells to defend themselves against attack, but they possess the necessary components for detecting invasion and building up defense response. Xyloglucan endotransglucosylase/hydrolases (XTHs) are cell wall enzymes that are able to graft xyloglucan chains to oligosaccharides 46 . One of its functions in plants is defense reaction against parasitic plants 47 . In tomato, xyloglucan endotransglycosylase/hydrolase (XTH) plays a major role in defense reactions against plant parasite Cuscuta reflexa 47 .
Plants have evolved a series of mechanisms to resist pathogens infection. Most plant disease resistance (R) genes contain nucleotide-binding site (NBS) and leucine-rich repeat (LRR) domains. NBS domains could bind and hydrolyze ATP or GTP, while LRR domains are critical for the formation of protein-protein interactions 48 . NBS-LRR proteins have been suggested as the largest class of known R proteins that can either directly or indirectly recognize the presence of pathogens 49 . R gene proteins are involved in pathogen detection and disease resistance 50 . The Recognition of Peronospora Parasitica 13-like (RPP13-like) genes also play important roles in the resistance of various plant diseases including the downy mildew caused by Peronospora parasitica. In Arabidopsis, the RPP13-Nd, cloned from an ecotype (Niederzenz (Nd-1)), was characterized to resist the infection of various isolates of P. parasitica 51 .
Leucine-rich repeat receptor-like protein kinases (LRR-RLKs) are the largest group of receptor-like kinases in plants and play vital roles in development and defense-related processes including cell proliferation, hormone perception, host-specific defense response, wounding response and symbiosis 52 . In Arabidopsis two LRR were identified to regulate cell death and innate immunity 53 . According to Yuan et al. 54 , LRR-RLK can positively regulate plant biotic resistance but negatively regulate plant abiotic tolerance in Arabidopsis . Interestingly, several RLKs were found to possess dual or multiple roles during plant growth and development. For example, ERECTA is involved in both plant development and pathogen defense responses 55 .
U-box proteins significantly contribute to the ability of plants to respond to diverse environmental stresses, due to plant immobility 56 . The ubiquitination pathway regulates growth, development, and stress responses in plants, and the U-box protein family of ubiquitin ligases plays important role in this pathway. In higher plants, U-box-ARM proteins are associated with regulation of cell death and defense 57 . In addition, VQ proteins regulate diverse developmental processes, including responses to biotic and abiotic stresses, seed development, and photomorphogenesis 58 . Members of the VQ family either play a positive or negative role in plant immune response. Plants with loss of function of VQ23 lack resistance to both Botrytis cinerea and Pseudomonas syringae. While lines, which overexpresses VQ23, showed reduced disease symptoms upon infection with either pathogen 58 .
Cytochrome P450 superfamily proteins were also associated with emerged Striga plants, they are often involved in phytoalexin synthesis and the scavenging of toxins. Plants utilize a wide array of cytochrome P450 monoxygenases (P450s) in biosynthetic and detoxification pathways. Several genes encoding P450s were observed to be highly up-regulated during the resistance response to S. hermonthica in rice 38 . From this study, it was observed that the genomic regions controlling grain yield under Striga infestation differs from the genomic region controlling grain yield under non-infested condition. Genomic regions identified to be associated with plant defense mechanisms will be developed into kompetitive allele-specific PCR (KASP) genotyping assay and this will be validated in independent populations to improve Striga resistance breeding in maize before deployment for use in marker-assisted selection. Identified SNPs will also help expedite the use of molecular markers in Striga resistance improvement through the use of marker-assisted backcrossing (MABC) to advance the effectiveness of breeding for superior and desirable qualities but susceptible to Striga infestation.
In conclusion, most of the significant SNPs discovered in this study encode genes associated with plant defense mechanism. Most of the QTLs identified have not been documented in maize, indicating they are novel and are addition to the already identified QTLs for Striga resistance in maize from other studies. QTNs identified in this study can be potentially used to expedite the use of marker-assisted selection (MAS) in breeding for durable  Ethylene gas was sprayed two weeks before planting to induce suicidal germination of Striga seeds in the soil. Two maize seeds were planted in a 6 cm deep hole inoculated with 8.5 g of sand mixed with Striga seeds. The sand-Striga mixture contains approximately 3000 germinable Striga seeds. The Striga seeds used in this study were collected from sorghum field from the previous planting season in Mokwa and Abuja with farmers' consent before usage. Two weeks after planting, maize plants were thinned to one plant per hill to attain a population density of 53,333 plant/ha. Fertilizer was applied at the rate of 30 kg/ha of nitrogen, 60 kg/ha each of phosphorus and potassium at planting and an additional 30 kg/ha nitrogen was applied four weeks later. Weeds other than Striga were removed from plots manually throughout the planting season. Data were taken under both infested and non-infested conditions, except for Striga damage score and Striga emergence, recorded only under Striga infestation. Data recorded under the two environments included plant height, ear aspect, and grain yield (Supplementary Table S5). Ears were collected separately from each line and shelled to estimate per cent moisture in the grain. Grain yield was then calculated from grain weight adjusted for 15% moisture. This study is geared towards improving IITA maize breeding, and it complies with the country's local and national regulations. Data analysis. Phenotypic analysis. Analysis of variance combined across the four year-location, which were hereafter referred to as environments, were computed for all traits measured under Striga infested and noninfested conditions based on a mixed-model analysis with restricted maximum likelihood procedure in SAS version 9.4 59 . In this analysis, genotypes were considered fixed while all other factors were random.
Separate analyses were conducted for traits measured under infested and non-infested conditions. The mixed model analysis generated the best linear unbiased estimates (BLUEs), the variance components and broad-sense heritability estimates. In addition, Spearman rank correlations between pairs of environments were computed for the Striga resistance-related traits using SAS version 9.4. Also, correlation analysis among the different traits was performed using R software, and results were displayed as heat map.
Genotyping and filtering. Genomic DNA was extracted from young leaf samples of the 150 maize inbred lines using the modified cetyltrimethylammonium bromide (CTAB) protocol 60 . Purified DNA was sent to Elshire facility in New Zealand for genotyping-by-sequencing (GBS) 61 . Genomic DNA was digested with the restriction enzyme (ApeK1), and genotyping-by-sequencing (GBS) of the libraries were constructed in 96-plex and sequenced on Illumina HiSeq2500 following manufacturer's protocol. Raw flow cell output was processed to genotype calls using the TASSEL-GBS pipeline. Reads and tags found in each sequencing result were aligned with the Zea mays L. genome reference, version AGPV3 (B73 Ref Gen v4 assembly). SNPs with minor allele www.nature.com/scientificreports/ frequency (MAF) of < 0.05 and missing rate of > 10% were excluded from the genotyping dataset using PLINK 1.9 beta 62 .
Population structure and linkage disequilibrium. To explore the genetic relationship among the inbred lines, principal component analysis (PCA) was conducted using factorMiner package in R 63 . The pairwise genetic distance was calculated through identity-by-state (IBS), and the phylogenetic tree was generated using analysis of phylogenetics and evolution (ape) R package 64 . The population stratification among the inbred lines was assessed using Admixture software 65 . The method uses maximum likelihood estimation on data from many loci to estimate individual ancestries among the inbred lines. The analysis was performed using a cross-validation error (k) varying from 2 to 10. The most appropriate k-value selected exhibit low cross-validation error compared to other k-values. LD among markers was calculated using PLINK software. The window size for LD calculation was set based on the number of SNPs located in the genome. Pairwise linkage disequilibrium was measured using the squared allele frequency correlations, according to Weir,66 , and assessed by calculating r 2 for pairs of SNP loci.
Marker-trait association analysis. All the phenotypic and genotypic information from the 150 diverse maize inbred lines were used to detect SNPs using four of the GWAS multi-locus models, multi-locus random-SNPeffect MLM (mrMLM), fast multi-locus random-SNP-effect mixed linear model (FASTmrMLM), fast multi-locus random-SNP-effect efficient mixed model association (FASTmrEMMA) and polygenic-background-controlbased least angle regression empirical Bayes (pLARmEB), implemented in mrMLM v4.0 (https:// cran.r-proje ct. org/ web/ packa ges/ mrMLM. GUI/ index. html). The unified parameter settings for the four methods were as follows; the Q + K model was used, where the population structure value Q was calculated by Admixture software 65 and the kinship value K was analyzed by the "mrMLM" software package. The limit of detection (LOD) score was set to 6 for robust QTNs for all measured trait. The Manhattan and QQ plots for GWAS were displayed using the R package CMplot.
Gene annotation. SNPs detected for Striga hermonthica resistance-related traits by the four mrMLM methods were mapped to the maize reference genome B73 RefGen_V4 to identify associated candidate genes. The genes corresponding to each QTN was determined in MaizeGDB according to its physical position. The functional annotations of the candidate genes were predicted in NCBI. The Pairwise LD estimates in the region of interest for significantly associated markers were investigated using Haploview 4.2. Finally, LD plotting was done based on base pairs (bp) distance, using "ggplot2" package in R 67 .
Marker effect prediction and variants comparison. Variants (ref/alt) associated with significant SNPs were identified using rstatix package implemented in R, and their effect were compared using ANOVA p < 0.05. The nature of the SNP marker (favorable and unfavorable) was defined based on the direct contribution to the traits using rstatix and visualized using ggplot2.

Data availability
Phenotypic data presented are within this document and the genotypic data can be provided upon request.